13 research outputs found

A Review and Evaluation of Elastic Distance Functions for Time Series Clustering

Author: Bagnall Anthony
Holder Chris
Middlehurst Matthew
Publication date: 26/04/2023

Time series clustering is the act of grouping time series data without recourse to a label. Algorithms that cluster time series can be classified into two groups: those that employ a time series specific distance measure; and those that derive features from time series. Both approaches usually rely on traditional clustering algorithms such as k-means. Our focus is on distance-based time series clustering algorithms that employ elastic distance measures, i.e. distances that perform some kind of realignment whilst measuring distance. We describe nine commonly used elastic distance measures and compare their performance with k-means and k-medoids clustering. Our findings are surprising. The most popular technique, dynamic time warping (DTW), performs worse than Euclidean distance with k-means, and even when tuned, is no better. Using k-medoids rather than k-means improved the clusterings for all nine distance measures. DTW is not significantly better than Euclidean distance with k-medoids. Generally, distance measures that employ editing in conjunction with warping perform better, and one distance measure, the move-split-merge (MSM) method, is the best performing measure of this study. We also compare to clustering with DTW using barycentre averaging (DBA). We find that DBA does improve DTW k-means, but that the standard DBA is still worse than using MSM. Our conclusion is to recommend MSM with k-medoids as the benchmark algorithm for clustering time series with elastic distance measures. We provide implementations in the aeon toolkit, results and guidance on reproducing results on the associated GitHub repository.

arXiv.org e-Print Archive
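
To illustrate what "realignment whilst measuring distance" means in the abstract above, here is a minimal sketch of dynamic time warping in plain Python. It is not the aeon implementation: the function names, the squared pointwise cost, and the `window` parameter (a band limiting how far the alignment may stray from the diagonal) are assumptions made for this example.

```python
import math


def dtw_distance(a, b, window=None):
    """DTW between two univariate sequences using a squared pointwise cost.

    `window` is a hypothetical parameter: the largest allowed |i - j| along
    the alignment path. None means the warping is unconstrained.
    """
    n, m = len(a), len(b)
    if n == 0 or m == 0:
        raise ValueError("both sequences must be non-empty")
    # The band must be at least |n - m| wide or no path reaches the corner.
    w = max(n, m) if window is None else max(window, abs(n - m))
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(max(1, i - w), min(m, i + w) + 1):
            d = (a[i - 1] - b[j - 1]) ** 2
            # Extend the cheapest of the three admissible predecessor cells.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def squared_euclidean(a, b):
    """Lock-step baseline: no realignment, equal-length sequences only."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


series_a = [0, 0, 1, 2, 1, 0]
series_b = [0, 1, 2, 1, 0, 0]  # the same bump, shifted one step earlier
print(dtw_distance(series_a, series_b))       # 0.0
print(squared_euclidean(series_a, series_b))  # 4
```

With `window=0` the alignment is forced onto the diagonal, so for equal-length series the result reduces to the squared Euclidean distance. A distance function of this kind can be plugged into k-medoids directly, because k-medoids only needs pairwise distances, whereas k-means also needs an averaging step.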

A tale of two toolkits, report the third: on the usage and performance of HIVE-COTE v1.0

Author: Bagnall Anthony
Flynn Michael
Large James
Lines Jason
Middlehurst Matthew
Publication date: 25/04/2020

The Hierarchical Vote Collective of Transformation-based Ensembles (HIVE-COTE) is a heterogeneous meta ensemble for time series classification. Since it was first proposed in 2016, the algorithm has undergone some minor changes and there is now a configurable, scalable and easy to use version available in two open source repositories. We present an overview of the latest stable HIVE-COTE, version 1.0, and describe how it differs from the original. We provide a walkthrough guide of how to use the classifier, and conduct extensive experimental evaluation of its predictive performance and resource usage. We compare the performance of HIVE-COTE to three recently proposed algorithms

arXiv.org e-Print Archive

The Canonical Interval Forest (CIF) Classifier for Time Series Classification

Author: Bagnall Tony
Large James
Middlehurst Matthew
Publication venue: IEEE Conference Publications
Publication date: 20/08/2020

Time series classification (TSC) is home to a number of algorithm groups that utilise different kinds of discriminatory patterns. One of these groups describes classifiers that predict using phase-dependent intervals. The time series forest (TSF) classifier is one of the best known interval methods, and has demonstrated strong performance as well as relative speed in training and predictions. However, recent advances in other approaches have left TSF behind. TSF originally summarises intervals using three simple summary statistics. The 'catch22' feature set of 22 time series features was recently proposed to aid time series analysis through a concise set of diverse and informative descriptive characteristics. We propose combining TSF and catch22 to form a new classifier, the Canonical Interval Forest (CIF). We outline additional enhancements to the training procedure, and extend the classifier to include multivariate classification capabilities. We demonstrate a large and significant improvement in accuracy over both TSF and catch22, and show it to be on par with top performers from other algorithmic classes. By upgrading the interval-based component from TSF to CIF, we also demonstrate a significant improvement in the hierarchical vote collective of transformation-based ensembles (HIVE-COTE) that combines different time series representations. HIVE-COTE using CIF is significantly more accurate on the UCR archive than any other classifier we are aware of and represents a new state of the art for TSC.

arXiv.org e-Print Archive

University of East Anglia digital repository
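
The interval idea in the abstract above can be sketched in a few lines of Python. The abstract does not name TSF's three summary statistics; this sketch assumes the usual TSF choice of mean, standard deviation and least-squares slope. The function names, the `min_length` default and the seeding scheme are hypothetical, and this is not the authors' code.

```python
import random
import statistics


def interval_features(series, start, end):
    """Mean, standard deviation and least-squares slope of series[start:end]."""
    window = series[start:end]
    n = len(window)
    if n < 2:
        raise ValueError("an interval needs at least two points for a slope")
    mean = statistics.fmean(window)
    std = statistics.pstdev(window)
    # Slope of the ordinary least-squares line through (0, x0), (1, x1), ...
    t_mean = (n - 1) / 2
    denom = sum((t - t_mean) ** 2 for t in range(n))
    slope = sum((t - t_mean) * (x - mean) for t, x in enumerate(window)) / denom
    return [mean, std, slope]


def random_interval_transform(series, n_intervals, min_length=3, seed=0):
    """Concatenate the three statistics over randomly placed intervals."""
    if len(series) < min_length:
        raise ValueError("series is shorter than the minimum interval length")
    rng = random.Random(seed)
    features = []
    for _ in range(n_intervals):
        length = rng.randint(min_length, len(series))
        start = rng.randint(0, len(series) - length)
        features.extend(interval_features(series, start, start + length))
    return features


row = random_interval_transform([float(v) for v in range(10)], n_intervals=4)
print(len(row))  # 12 features: 3 statistics for each of 4 intervals
```

In an interval forest each tree would typically draw its own intervals and be trained on the resulting feature rows. Going by the abstract, CIF's change is to draw the per-interval features from the catch22 set in addition to these simple statistics.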

Unsupervised Feature Based Algorithms for Time Series Extrinsic Regression

Author: Arcencio Guilherme
Bagnall Anthony
Guijo-Rubio David
Middlehurst Matthew
Silva Diego Furtado
Publication date: 02/05/2023

Time Series Extrinsic Regression (TSER) involves using a set of training time series to form a predictive model of a continuous response variable that is not directly related to the regressor series. The TSER archive for comparing algorithms was released in 2022 with 19 problems. We increase the size of this archive to 63 problems and reproduce the previous comparison of baseline algorithms. We then extend the comparison to include a wider range of standard regressors and the latest versions of TSER models used in the previous study. We show that none of the previously evaluated regressors can outperform a regression adaptation of a standard classifier, rotation forest. We introduce two new TSER algorithms developed from related work in time series classification. FreshPRINCE is a pipeline estimator consisting of a transform into a wide range of summary features followed by a rotation forest regressor. DrCIF is a tree ensemble that creates features from summary statistics over random intervals. Our study demonstrates that both algorithms, along with InceptionTime, exhibit significantly better performance compared to the other 18 regressors tested. More importantly, the two proposed models (DrCIF and FreshPRINCE) are the only ones that significantly outperform the standard rotation forest regressor.

Comment: 19 pages, 21 figures, 6 tables. Appendix include

arXiv.org e-Print Archive

The great multivariate time series classification bake off: a review and experimental evaluation of recent algorithmic advances

Author: Bagnall Anthony
Flynn Michael
Large James
Middlehurst Matthew
Pasos Ruiz Alejandro
Publication venue: Springer Science and Business Media LLC
Publication date: 01/03/2021

Time Series Classification (TSC) involves building predictive models for a discrete target variable from ordered, real-valued attributes. Over recent years, a new set of TSC algorithms has been developed that has significantly improved on the previous state of the art. The main focus has been on univariate TSC, i.e. the problem where each case has a single series and a class label. In reality, it is more common to encounter multivariate TSC (MTSC) problems where the time series for a single case has multiple dimensions. Despite this, much less consideration has been given to MTSC than the univariate case. The UCR archive has provided a valuable resource for univariate TSC, and the lack of a standard set of test problems may explain why there has been less focus on MTSC. The UEA archive of 30 MTSC problems released in 2018 has made comparison of algorithms easier. We review recently proposed bespoke MTSC algorithms based on deep learning, shapelets and bag of words approaches. If an algorithm cannot naturally handle multivariate data, the simplest approach to adapt a univariate classifier to MTSC is to ensemble it over the multivariate dimensions. We compare the bespoke algorithms to these dimension independent approaches on the 26 of the 30 MTSC archive problems in which all series are of equal length. We demonstrate that four classifiers are significantly more accurate than the benchmark dynamic time warping algorithm and that one of these recently proposed classifiers, ROCKET, achieves significant improvement on the archive datasets in at least an order of magnitude less time than the other three.

University of East Anglia digital repository
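
The "ensemble it over the multivariate dimensions" approach in the abstract above can be sketched as follows. The abstract does not say how the per-dimension predictions are combined, so the majority vote here is an assumption, as are the class names and the toy 1-nearest-neighbour base classifier. This is an illustration of the idea, not the paper's code.

```python
from collections import Counter


class OneNearestNeighbour:
    """Toy univariate base classifier using squared Euclidean distance."""

    def fit(self, X, y):
        self.X, self.y = list(X), list(y)
        return self

    def predict(self, series):
        dists = [sum((a - b) ** 2 for a, b in zip(series, x)) for x in self.X]
        return self.y[dists.index(min(dists))]


class DimensionEnsemble:
    """Fit one univariate classifier per dimension, then take a majority vote."""

    def __init__(self, make_classifier):
        self.make_classifier = make_classifier

    def fit(self, X, y):
        # X[i][d] is the series for case i in dimension d.
        n_dims = len(X[0])
        self.members = [
            self.make_classifier().fit([case[d] for case in X], y)
            for d in range(n_dims)
        ]
        return self

    def predict(self, case):
        votes = Counter(m.predict(case[d]) for d, m in enumerate(self.members))
        return votes.most_common(1)[0][0]


train_X = [
    [[0, 0], [0, 0], [9, 9]],  # class "a"
    [[5, 5], [5, 5], [0, 0]],  # class "b"
]
train_y = ["a", "b"]
model = DimensionEnsemble(OneNearestNeighbour).fit(train_X, train_y)
print(model.predict([[0, 1], [1, 0], [0, 0]]))  # a (two of three dimensions vote "a")
```

Each member sees only one dimension, so the approach cannot exploit interactions between dimensions. That limitation is one reason to compare it against bespoke multivariate algorithms, as the abstract describes.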

HIVE-COTE 2.0: a new meta ensemble for time series classification

Author: Bagnall Anthony
Bostrom Aaron
Flynn Michael
Large James
Lines Jason
Middlehurst Matthew
Publication venue: Springer Science and Business Media LLC
Publication date: 15/04/2021

The Hierarchical Vote Collective of Transformation-based Ensembles (HIVE-COTE) is a heterogeneous meta ensemble for time series classification. HIVE-COTE forms its ensemble from classifiers of multiple domains, including phase-independent shapelets, bag-of-words based dictionaries and phase-dependent intervals. Since it was first proposed in 2016, the algorithm has remained state of the art for accuracy on the UCR time series classification archive. Over time it has been incrementally updated, culminating in its current state, HIVE-COTE 1.0. During this time a number of algorithms have been proposed which match the accuracy of HIVE-COTE. We propose comprehensive changes to the HIVE-COTE algorithm which significantly improve its accuracy and usability, presenting this upgrade as HIVE-COTE 2.0. We introduce two novel classifiers, the Temporal Dictionary Ensemble and Diverse Representation Canonical Interval Forest, which replace existing ensemble members. Additionally, we introduce the Arsenal, an ensemble of ROCKET classifiers as a new HIVE-COTE 2.0 constituent. We demonstrate that HIVE-COTE 2.0 is significantly more accurate on average than the current state of the art on 112 univariate UCR archive datasets and 26 multivariate UEA archive datasets

arXiv.org e-Print Archive

University of East Anglia digital repository

Identification of novel risk loci, causal insights, and heritable risk for Parkinson's disease: a meta-analysis of genome-wide association studies

Author: Adarmes-Gómez Astrid D
Agee Michelle
Aguilar Miquel
Aitkulova Akbota
Akhmetzhanov Vadim
Alcalay Roy N
Alipanahi Babak
Alvarez Ignacio
Alvarez Victoria
Anderson Tim
Andreassen Ole A
Auton Adam
Bandres-Ciga Sara
Bangale Tushar
Barrero Francisco Javier
Bell Robert K
Bentley Steven
Bergareche Yarza Jesús Alberto
Bernal-Bernal Inmaculada
Billingsley Kimberley
Blauwendraat Cornelis
Blazquez Marta
Bonilla-Toribio Marta
Botia Juan A
Botía Juan A
Boungiorno María Teresa
Bras Jose
Brice Alexis
Brockmann Kathrin
Bryc Katarzyna
Bubb Vivien
Buiza-Rueda Dolores
Carrillo Fátima
Carrión-Claro Mario
Cerdan Debora
Chang Diana
Chelban Viorica
Clarimón Jordi
Clarke Carl
Compta Yaroslau
Cookson Mark R
Corvol Jean-Christophe
Craig David W
Cámara Ana
Dalrymple-Alford John
Danjou Fabrice
Diez-Fairen Monica
Dols-Icardo Oriol
Duarte Jacinto
Duran Raquel
Elson Sarah L
Escamilla-Sevilla Francisco
Escott-Price Valentina
Ezquerra Mario
Faghri Faraz
Feliz Cici
Fernández Manel
Fernández-Santiago Rubén
Finkbeiner Steven
Foltynie Thomas
Fontanillas Pierre
Fowdar Javed
Furlotte Nicholas A
Gan-Or Ziv
Garcia Ciara
García-Ruiz Pedro
Gasser Thomas
Gibbs J Raphael
Gomez Heredia Maria Jose
Gonzalez-Aramburu Isabel
González Manuel Menéndez
Graham Robert R
Gratten Jacob
Guelfi Sebastian
Guerreiro Rita
Gómez-Garre Pilar
Halliday Glenda
Hardy John
Hardy John A
Hassin-Baer Sharon
Heilbron Karl
Henders Anjali K
Hernandez Dena G
Heutink Peter
Hickie Ian
Hicks Barry
Hinds David A
Hoenicka Janet
Holmans Peter
Houlden Henry
Huber Karen E
Infante Jon
Iwaki Hirotaka
Jankovic Joseph
Jesús Silvia
Jewett Ethan M
Jiang Yunxuan
Jimenez-Escrig Adriano
Kaishybayeva Gulnaz
Kaiyrzhanov Rauan
Karimova Altynay
Kassam Irfahan
Kennedy Martin
Kia Demis A
Kinghorn Kerri J
Kleinman Aaron
Koks Sulev
Krohn Lynne
Kulisevsky Jaime
Kwok John
Labrador-Espinosa Miguel A
Leonard Hampton
Leonard Hampton L
Lesage Suzanne
Lewis Patrick
Lewis Simon
Lin Keng-Han
Litterman Nadia K
Lopez-Sendon Jose Luis
Lovering Ruth
Lubbe Steven
Lungu Codrin
Macias Daniel
Majamaa Kari
Manzoni Claudia
Marinus Johan
Marti Maria Jose
Martinez Maria
Martínez Torres Irene
Martínez-Castrillo Juan Carlos
Marín Juan
Mata Marina
McCreight Jennifer C
McIntyre Matthew H
McManus Kimberly F
Mellick George
Mencacci Niccolo E
Middlehurst Ben
Mir Pablo
Mok Kin Y
Montgomery Grant
Morris Huw R
Mountain Joanna L
Muñoz Esteban
Méndez-del-Barrio Carlota
Mínguez Adolfo
Nalls Mike A
Narendra Derek
Noblin Elizabeth S
Northover Carrie AM
Noyce Alastair J
Ojo Oluwadamilola O
Okubadejo Njideka U
Pagola Ana Gorostidi
Pastor Pau
Pearson John
Perez Errazquin Francisco
Periñán-Tocino Teresa
Pihlstrom Lasse
Pihlstrøm Lasse
Pitcher Toni
Pitts Steven J
Plun-Favreau Helene
Poznik G David
Quinn John
R'Bibo Lea
Reed Xylena
Reich Stephen
Rezola Elisabet Mondragon
Rizig Mie
Rizzu Patrizia
Robak Laurie
Rodriguez Antonio Sanchez
Rouleau Guy A
Ruiz-Martínez Javier
Ruz Clara
Ryten Mina
Sadykova Dinara
Sathirapongsasuti J Fah
Savitt Joseph
Scholz Sonja W
Schreglmann Sebastian
Schulte Claudia
Sharma Manu
Shashkin Chingiz
Shelton Janie F
Shringarpure Suyash
Shulman Joshua M
Shulman Lisa M
Sidorenko Julia
Sierra María
Siitonen Ari
Silburn Peter A
Simón-Sánchez Javier
Singleton Andrew B
Suarez-Sanmartin Esther
Sutherland Margaret
Taba Pille
Tabernero Cesar
Tan Manuela
Tan Manuela X
Tartari Juan Pablo
Tejera-Parrado Cristina
Tian Chao
Tienari Pentti
Toft Mathias
Tolosa Eduard
Trabzuni Daniah
Tung Joyce
Vacic Vladimir
Valldeoriola Francesc
Vallerga Costanza L
van Hilten Jacobus J
Van Keuren-Jensen Kendall
Vargas-González Laura
Vela Lydia
Visscher Peter M
Vives Francisco
von Coelln Rainer
Wallace Leanne
Wang Xin
Williams Nigel
Wilson Catherine H
Wood Nicholas W
Wray Naomi R
Xue Angli
Yang Jian
Ylikotila Pauli
Young Emily
Zhang Futao
Zharkinbekova Nazira
Zharmukhanov Zharkyn
Zholdybayeva Elena
Zimprich Alexander
Publication venue: Elsevier BV
Publication date: 01/12/2019

Background: Genome-wide association studies (GWAS) in Parkinson's disease have increased the scope of biological knowledge about the disease over the past decade. We aimed to use the largest aggregate of GWAS data to identify novel risk loci and gain further insight into the causes of Parkinson's disease.

Methods: We did a meta-analysis of 17 datasets from Parkinson's disease GWAS available from European ancestry samples to nominate novel loci for disease risk. These datasets incorporated all available data. We then used these data to estimate heritable risk and develop predictive models of this heritability. We also used large gene expression and methylation resources to examine possible functional consequences as well as tissue, cell type, and biological pathway enrichments for the identified risk factors. Additionally, we examined shared genetic risk between Parkinson's disease and other phenotypes of interest via genetic correlations followed by Mendelian randomisation.

Findings: Between Oct 1, 2017, and Aug 9, 2018, we analysed 7·8 million single nucleotide polymorphisms in 37 688 cases, 18 618 UK Biobank proxy-cases (ie, individuals who do not have Parkinson's disease but have a first degree relative that does), and 1·4 million controls. We identified 90 independent genome-wide significant risk signals across 78 genomic regions, including 38 novel independent risk signals in 37 loci. These 90 variants explained 16–36% of the heritable risk of Parkinson's disease depending on prevalence. Integrating methylation and expression data within a Mendelian randomisation framework identified putatively associated genes at 70 risk signals underlying GWAS loci for follow-up functional studies. Tissue-specific expression enrichment analyses suggested Parkinson's disease loci were heavily brain-enriched, with specific neuronal cell types being implicated from single cell data. We found significant genetic correlations with brain volumes (false discovery rate-adjusted p=0·0035 for intracranial volume, p=0·024 for putamen volume), smoking status (p=0·024), and educational attainment (p=0·038). Mendelian randomisation between cognitive performance and Parkinson's disease risk showed a robust association (p=8·00 × 10⁻⁷).

Interpretation: These data provide the most comprehensive survey of genetic risk within Parkinson's disease to date, to the best of our knowledge, by revealing many additional Parkinson's disease risk loci, providing a biological context for these risk factors, and showing that a considerable genetic component of this disease remains unidentified. These associations derived from European ancestry datasets will need to be followed up with more diverse data.

Funding: The National Institute on Aging at the National Institutes of Health (USA), The Michael J Fox Foundation, and The Parkinson's Foundation (see appendix for full list of funding sources).

University of Liverpool Repository